Usability of German hospital administrative claims data for healthcare research: General assessment and use case of multiple myeloma in Munich university hospital in 2015–2017

Objectives To assess the usability of German hospital administrative claims data (GHACD) to determine inpatient management patterns, healthcare resource utilization, and quality-of-care in patients with multiple myeloma (PwMM). Methods Based on a German tertiary hospital’s claims data (2015–2017), PwMM aged >18 years were included if they had an International Classification of Diseases, Tenth Revision, code of C90.0 or received anti-MM therapy. Subgroup analysis was performed on stem cell transplantation (SCT) patients. Results Of 230 PwMM, 59.1% were men; 56.1% were aged ≥65 years. Hypertension and infections were present in 50% and 67.0%, respectively. Seventy percent of PwMM received combination therapy. Innovative drugs such as bortezomib and lenalidomide were given to 36.1% and 10.9% of the patients, respectively. Mean number of admissions and mean hospitalization length/patient were 3.69 (standard deviation (SD) 2.71 (1–16)) and 12.52 (SD 9.55 (1–68.5)) days, respectively. In-hospital mortality was recorded in 12.2%. Seventy-two percent of SCT patients (n = 88) were aged ≤65 years, 22.7% required second transplantation, and 89.8% received platelet transfusion at a mean of 1.42 (SD 0.63 (1–3)). Conclusion GHACD provided relevant information essential for healthcare studies about PwMM from routine care settings. Data fundamental for quality-of-care assessment were also captured.

Introduction
Secondary data sources have been increasingly used in health services research over the past few years [1][2][3]. Administrative medical data, also known as "claims data," are an example of secondary data collected for purposes other than scientific research [4]. The literature describes secondary databases as appropriate for answering healthcare questions from different perspectives, such as those of healthcare providers and payers [5,6]. Secondary databases also reflect healthcare utilization and treatment patterns in routine care settings [5][6][7]. They have been extensively used to address questions related to, e.g., epidemiology, drug safety and effectiveness, treatment patterns, the impact of healthcare policies, and quality-of-care assessment [8][9][10][11][12]. They include large study populations and special subpopulations that are difficult to recruit for prospective observational studies, making them potentially convenient alternatives for health research studies [5].
In Germany, different administrative claims data sources are available for research, e.g., hospital-based administrative claims data such as those collected in the §21 dataset format [13], statutory claims data, data from the association of office-based physicians, and data from federal databases [5,9,14–17]. These databases differ slightly in data granularity and in the range of variables they contain, and in some respects they may be complementary. The use of claims data in health services research has increased in Germany over the past decade [5,18]. However, most such studies have involved claims data from German statutory health insurance (SHI) databases [8,15,16]. Although single-payer and SHI databases include a wide range of information important for health services studies, including cross-sector data from outpatient and emergency department visits, they contain few of the variables required for quality-of-care assessment from an inpatient all-payer perspective [19]. In contrast, hospital administrative claims data provide more information about all therapeutic and diagnostic interventions reimbursed beyond the diagnosis-related group (DRG) system from the inpatient perspective, irrespective of patients' health insurance. Thus, they could serve as a better alternative for quality-of-care assessment and benchmark evaluation studies from an inpatient all-payer perspective. Moreover, when combined with other data, e.g., SHI claims data and patient medical records, they could provide a comprehensive evaluation of any predefined disease condition in real-world settings.
For rare conditions such as multiple myeloma (MM), secondary databases can serve as ideal sources of evidence concerning management patterns and healthcare resource utilization in routine care settings. MM is an incurable disease of plasma cells primarily affecting older people [1,3,20,21]. In Germany, MM is the third most common hematologic neoplasm after leukemia and non-Hodgkin lymphoma [22]. The treatment landscape for MM is continuously changing, although it is primarily directed at providing symptomatic relief, controlling the disease, and increasing the overall survival of patients [1,23–25]. Despite the improvement in the overall survival of patients with MM using novel agents, MM poses an economic burden that must be evaluated and addressed within routine care settings [2]. Previous studies on MM in Germany conducted using real-world data were based primarily on surveys [26,27] and patient charts [28]. Depending on their quality and granularity, German hospital databases may provide valuable information for epidemiologic and health services studies, benchmark evaluations, and quality-of-care assessments. To the best of our knowledge, no study in Germany has yet used hospital administrative claims databases in health services research to address MM-related issues.
This study was conducted to assess the usability of German hospital administrative claims data to determine inpatient management patterns, healthcare resource utilization, and quality-of-care. As a use case, we focused on patients with MM.

Study design and data source
All analyses were conducted on the basis of hospital administrative claims data of the Ludwig Maximilian University hospital (KUM), a tertiary university hospital with a specialized hematology-oncology department containing a specialized center to treat MM cases.
Claims data in German hospitals are collected using the uniform structure of the §21 dataset [13], a performance and flat-rate dataset based on the German diagnosis-related group (G-DRG) system and the International Classification of Diseases, Tenth Revision, German Modification (ICD-10-GM). It contains information on all health services (e.g., diagnostic and therapeutic procedures reimbursed beyond the DRG system) provided to patients during their hospitalization, irrespective of their health insurance. We obtained hospital administrative claims data in the §21 dataset structure from the KUM for the period 2015–2017. The dataset contains information on patient identifiers, case number, pay area (e.g., DRG, additional fees, fees for novel interventions), health insurance ID, demographics (age, gender), reason for admission (primary diagnosis vs. secondary diagnosis), admitting department, diagnosis code, localization of diagnosis, procedure codes, date of procedure, admission and discharge dates, and reason for discharge/transfer. The hospital administrative claims dataset was anonymized by the Trust Center and processed by the Medical Data Integration Center, both of which are located at Ludwig Maximilian University (LMU) and KUM within the Data Integration for Future Medicine (DIFUTURE) consortium of the Medical Informatics Initiative (MII), which is funded by the German Federal Ministry of Education and Research (BMBF). In this context, the hospital administrative claims dataset was used for the MII's national, cross-consortia demonstrator study after obtaining approval from the Ethical Review Board of LMU's Faculty of Medicine and KUM's Data Protection Officer. We followed the RECORD checklist in preparing this manuscript [29].
Inclusion criteria. The study sample comprised patients with multiple myeloma who had inpatient records during the period 2015–2017. Patients aged >18 years were included if they fulfilled at least one of the following conditions: (1) at least one inpatient MM diagnosis (ICD-10 = C90.00 or C90.01) as the primary reason for hospitalization or (2) receipt of anti-MM therapy. The ICD-10 code for identifying patients with MM was validated elsewhere [30].
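The inclusion logic can be illustrated with a minimal Python sketch. It is not the authors' implementation (the analyses were run in SAS); the record layout (`Admission` and its field names) is hypothetical, and matching OPS codes by prefix is an assumption of this sketch. The ICD-10 codes are those named above, and the anti-MM OPS codes are those listed under Outcome measures.

```python
from dataclasses import dataclass, field

# Codes taken from the text; prefix matching for OPS codes is an assumption.
MM_ICD_CODES = ("C90.00", "C90.01")
ANTI_MM_OPS_PREFIXES = ("6-001.9", "6-003.g", "8-542", "8-543", "8-544")


@dataclass
class Admission:
    """One inpatient case (hypothetical, simplified record layout)."""
    patient_id: str
    age: int
    primary_icd: str
    ops_codes: list[str] = field(default_factory=list)


def has_mm_primary_diagnosis(adm: Admission) -> bool:
    """Condition 1: MM coded as the primary reason for hospitalization."""
    return adm.primary_icd in MM_ICD_CODES


def received_anti_mm_therapy(adm: Admission) -> bool:
    """Condition 2: at least one anti-MM procedure code on the case."""
    return any(code.startswith(ANTI_MM_OPS_PREFIXES) for code in adm.ops_codes)


def select_cohort(admissions: list[Admission]) -> set[str]:
    """Return IDs of patients aged >18 who meet at least one condition."""
    return {
        adm.patient_id
        for adm in admissions
        if adm.age > 18
        and (has_mm_primary_diagnosis(adm) or received_anti_mm_therapy(adm))
    }


# Invented example records, for illustration only.
example = [
    Admission("P1", 67, "C90.00"),
    Admission("P2", 54, "J18.9", ["6-001.9"]),  # not admitted for MM, but treated
    Admission("P3", 71, "I10.00"),              # neither condition met
    Admission("P4", 17, "C90.00"),              # excluded by age
]
print(sorted(select_cohort(example)))
```

Evaluating the two conditions per admission and collecting patient IDs in a set means a patient is included as soon as any one of their cases qualifies.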

Outcome measures
We began by evaluating data availability in the hospital administrative claims dataset. To compile a list of necessary data elements, we performed a narrative literature review of papers that investigated MM using administrative claims data. We evaluated the research questions, methods, data required to answer each research question, and prominent findings in the identified reports. Finally, we evaluated the presence of each of these elements in the hospital administrative claims dataset. We used a list of the required procedure codes (OPS codes) and specific ICD-10 codes to identify medications used, procedures performed, and diseases diagnosed (S1 Table). This list is intended to be a near-complete list of the variables required to answer healthcare research questions. It is also intended to serve as a blueprint for future studies that aim to link multiple secondary data sources, by indicating the source of each data variable. We used this list to determine which data variables were present in our dataset.
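The availability check amounts to comparing a list of required data elements against the fields of the dataset. The following Python sketch illustrates the idea under stated assumptions: the element names are hypothetical labels chosen for this example (the actual list is given in S1 Table), and the split into available and unavailable fields mirrors what is reported in the Methods and Results.

```python
# Hypothetical element names; the study's actual list is in S1 Table.
REQUIRED_ELEMENTS = [
    "age", "sex", "diagnosis_code", "procedure_code", "procedure_date",
    "admission_date", "discharge_date", "discharge_reason",
    "diagnosis_date", "disease_stage", "laboratory_results", "treatment_dose",
]

# Fields assumed to be present, following the dataset description in the text.
DATASET_FIELDS = {
    "age", "sex", "diagnosis_code", "procedure_code", "procedure_date",
    "admission_date", "discharge_date", "discharge_reason",
}


def availability(required, available):
    """Map each required element to True if the dataset provides it."""
    return {name: name in available for name in required}


report = availability(REQUIRED_ELEMENTS, DATASET_FIELDS)
missing = [name for name, present in report.items() if not present]
print("Missing elements:", missing)
```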
After identifying the data elements present in the hospital administrative claims dataset, we conducted a specific analysis to evaluate the comprehensiveness and usability of these data elements. First, we examined the demographic characteristics of patients with MM, including age and sex. Second, we examined their clinical characteristics in terms of disease stage and severity, comorbid conditions, disease- and/or treatment-related complications, and in-hospital mortality. Third, we examined management patterns in terms of prescribed medications, line of therapy, and therapeutic and diagnostic procedures. Anti-MM therapy included administration of bortezomib (OPS code = 6-001.9), lenalidomide (OPS code = 6-003.g), or combination therapy (OPS codes = 8-542, 8-543, and 8-544). Finally, healthcare utilization in terms of health resource consumption was defined as the number of readmissions that lasted >24 h, the length of hospitalization, and the therapeutic and diagnostic procedures performed.
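As an illustration of the utilization measures, the sketch below counts stays lasting more than 24 h per patient and computes the mean length of those stays. It is a minimal Python example rather than the SAS code used in the study; the tuple layout and the function name `summarize_utilization` are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def summarize_utilization(stays):
    """Summarize stays per patient.

    stays: iterable of (patient_id, admitted, discharged) with datetime values.
    Only stays longer than 24 h are counted, following the definition above.
    """
    days_per_patient = defaultdict(list)
    for patient_id, admitted, discharged in stays:
        duration = discharged - admitted
        if duration > timedelta(hours=24):
            days_per_patient[patient_id].append(duration.total_seconds() / 86400)
    return {
        patient_id: {"admissions": len(days), "mean_los_days": mean(days)}
        for patient_id, days in days_per_patient.items()
    }


# Invented example stays, for illustration only.
stays = [
    ("P1", datetime(2015, 1, 5, 8), datetime(2015, 1, 15, 8)),  # 10 days
    ("P1", datetime(2015, 3, 1, 9), datetime(2015, 3, 1, 18)),  # 9 h, not counted
    ("P1", datetime(2015, 6, 1, 8), datetime(2015, 6, 5, 8)),   # 4 days
    ("P2", datetime(2016, 2, 1, 12), datetime(2016, 2, 3, 12)), # 2 days
]
summary = summarize_utilization(stays)
print(summary)
```

Averaging the per-patient values in `summary` would then give cohort-level figures of the kind reported in Table 3.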
We conducted a subgroup analysis on SCT patients because it served as a homogeneous subgroup of patients with MM, and an index date from the start of the procedure could be set. We assessed the possibility of evaluating these patients' clinical characteristics in terms of complications after SCT, management pattern in terms of treatment received, and reason for hospitalization after the procedure.
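A minimal Python sketch of the index-date approach is shown below. It assumes that SCT procedures have already been identified and that their dates are available (the SCT procedure codes themselves are not listed in the text); the function name `sct_timeline` and the rule that only admissions starting after the index date count as readmissions are assumptions of this example.

```python
from datetime import date


def sct_timeline(admission_dates, sct_dates):
    """Summarize one patient's course relative to the first SCT (index date).

    admission_dates: dates of all recorded admissions for the patient
    sct_dates: dates of the patient's SCT procedures (may be empty)
    """
    if not sct_dates:
        return None  # patient does not belong to the SCT subgroup
    index_date = min(sct_dates)
    first_admission = min(admission_dates)
    return {
        "index_date": index_date,
        "days_from_first_admission": (index_date - first_admission).days,
        "number_of_scts": len(sct_dates),
        # Assumption: a readmission is any admission starting after the index date.
        "readmissions_after_index": sum(1 for d in admission_dates if d > index_date),
    }


# Invented example dates, for illustration only.
admissions = [date(2015, 1, 10), date(2015, 4, 1), date(2015, 7, 15), date(2015, 10, 2)]
transplants = [date(2015, 4, 3), date(2015, 10, 4)]
timeline = sct_timeline(admissions, transplants)
print(timeline)
```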

Statistical analysis
Categorical variables were presented descriptively as counts and percentages. Continuous variables were presented as mean and standard deviation (SD). Statistical analyses were conducted using the SAS 9.4 software (X64 10HOME platform, Copyright (c) 2002-2012 by SAS Institute Inc., Cary, NC, USA). A sunburst chart was produced using RStudio 3.6.1 (Version 1.2.500 © 2009-2019 RStudio, Inc.).
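The descriptive summaries can be sketched in a few lines of Python. This is an illustration only, since the study used SAS 9.4; the helper names are hypothetical, and the use of the sample SD (n − 1 denominator, as computed by `statistics.stdev`) is an assumption because the text does not state which SD estimator was applied.

```python
from collections import Counter
from statistics import mean, stdev


def describe_categorical(values):
    """Return {category: (count, percentage)} with one decimal place."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}


def describe_continuous(values):
    """Return mean, sample SD (assumed estimator), minimum, and maximum."""
    return {
        "mean": round(mean(values), 2),
        "sd": round(stdev(values), 2),
        "min": min(values),
        "max": max(values),
    }


# Invented example values, for illustration only.
sex_summary = describe_categorical(["m", "m", "m", "f", "f"])
admission_summary = describe_continuous([1, 2, 3, 4, 5])
print(sex_summary)
print(admission_summary)
```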

Results
The hospital administrative claims dataset contained the variables required for case identification and for evaluating the age and sex distribution among patients with MM (Table 1). It included some but not all of the information required to evaluate patients' clinical characteristics. It contained the variables required for identifying possible comorbid conditions and disease- and/or treatment-related complications based on ICD-10 codes. It also included information for identifying in-hospital mortality under a variable termed "discharge/transfer reason." The diagnosis date was not recorded in the dataset; thus, we could not maintain the same follow-up period for all patients. This also limited our ability to rigorously evaluate the chronological sequence of events and to distinguish between unrelated comorbid conditions and disease- and/or treatment-related complications. In other words, we could not set an index date at disease onset and follow patients' clinical history over time to identify the occurrence and development of other conditions or complications. Moreover, details regarding laboratory and radiological findings and disease stage and severity were unavailable, hindering the evaluation of disease stage and severity as well as disease risk assessment.
For evaluating management patterns, we could identify prescribed medications and the diagnostic and therapeutic procedures performed using pre-specified procedure codes. This approach allowed us to evaluate the treatment provided and the diagnostic and therapeutic procedures performed in terms of documentation frequency. However, because the diagnosis date was missing, we could not construct a line of therapy. Treatment initiation date, therapy duration and dose, and evidence of treatment discontinuation/switching were not recorded, limiting the appropriate evaluation of treatment patterns.
The hospital administrative claims dataset contained information for assessing healthcare utilization in terms of the number of hospital admissions and length of hospital stays. However, the dataset was limited to a single hospital department, and no data on outpatient and emergency department visits were available. Therefore, admissions to other departments within the same hospital were not captured in the dataset.

Description of study sample
We identified 325 patients with an MM diagnosis code, of whom 222 (68.3%) were admitted with MM as the primary reason for admission. An additional eight patients who received anti-MM therapy but were not admitted primarily for MM were included. Overall, 230 patients with MM were included in the study.
Combination therapy was administered to 70.4% of patients. Bortezomib (36.1%) and lenalidomide (10.9%) were most frequently administered to the patients, whereas 38.3% underwent SCT. Blood transfusion (71.3%) was the most frequent therapeutic modality (Table 2). Computed tomography (81.3%) and pulmonary function testing (52.3%) were the most frequent diagnostic modalities. Patients were admitted a mean of 3.69 (SD = 2.71) times, with a mean duration per hospital stay of 12.52 (SD = 9.55) days (Table 3).
In the subgroup analysis, the procedure date was set as the index date, and patients were followed up over time from that date. Among patients with SCT (n = 88), 71.6% were aged ≤65 years, with a mean age of 58 years at the first SCT (Table 4). The first SCT was performed a mean of 98.5 (SD = 83) days after the first recorded admission. Sixty-seven (76.1%) patients underwent a single SCT, and 1 patient received three SCTs during the 3-year study period. Regarding possible disease- and/or treatment-related complications, neutropenia (100%), thrombocytopenia (87.5%), and infection (78.4%) were the most frequent (Table 4). After SCT, 27 (30.7%) patients were readmitted at least once, with MM (56.9%) being the primary reason, followed by other tumors (21.7%; Fig 1). For the first three post-SCT readmissions, combination therapy (100%) and blood transfusions (96.3%–100%) were frequent (Fig 2). Bortezomib (40.7%) and lenalidomide (14.8%) were used post-SCT.

Discussion
So far, very limited information has been published on the usability of German hospital administrative claims data for evaluating routine patient care in complex and rare hematological conditions such as MM. Subsequently, within the context of the German federal Medical Informatics Initiative (MII) and its workstream Data Integration for Future Medicine (DIFUTURE), the question arose of what information routinely collected data (e.g., hospital administrative claims data) can provide to answer research questions. To our knowledge, the MM use case provides, for the first time, information on inpatient management patterns and resource utilization in a German tertiary teaching hospital. Such information might be used for different purposes, such as benchmark evaluation and quality assessment.
The hospital administrative claims data contained the data variables that allow the evaluation of demographics, clinical characteristics, management patterns, and resource utilization during inpatient stays. The data variables recorded in the dataset are age, gender, ICD-10 codes to identify comorbidities and treatment-related complications, OPS code, start date of the procedure, admission and discharge dates, admitting department, reason for admission, and reason for discharge (transfer to another department/hospital, death, end of treatment course). It was possible to identify MM cases and determine their basic demographic and clinical characteristics. Their treatment patterns and healthcare resource utilization were also evaluated. The availability of information on reimbursed interventions enabled us to identify a subgroup of patients who underwent SCT and to evaluate their complications and treatment post-SCT from an inpatient all-payer perspective. The hospital administrative claims data did not include clinical details such as dates of diagnosis, disease stage, disease severity, response to treatment, or laboratory results. Therefore, a comprehensive evaluation of MM care from disease onset was not feasible using this dataset. However, information on disease onset could be drawn from other administrative claims data, such as health insurance databases, or from patient medical records, while clinical details on disease severity, response to treatment, and laboratory results are better extracted from patient medical records. Our study's basic descriptive results align with previous studies that used administrative claims data in terms of the number of patients with MM and their demographics [1,2,22–26], further confirming the availability of the data required for case identification and demographic evaluation in the hospital administrative claims dataset.
Moreover, the standard of care that involves SCT and administration of novel therapeutic agents such as bortezomib and lenalidomide was integrated into the MM treatment landscape at the KUM. However, the transplantation rate recorded in the hospital administrative claims dataset was higher (38.3% of 230) than that reported by Song et al. [1] (16.2% of 24,507 patients) but consistent with the findings of Rifkin et al. (34% of 1450 patients) [31]. Although we could not construct lines of therapy, we could identify a subset of SCT patients who underwent more than one transplantation. The proportion of patients undergoing a second SCT (22.7% of 88) was higher than that reported by Ashcroft et al. (7% of 337 patients) [32]. A possible explanation for this discrepancy is that our data reflect only patients with MM who were hospitalized during the study period because of complications or because they required invasive interventions such as SCT, making them different from patients in outpatient departments or in other healthcare facilities. Moreover, KUM is a tertiary hospital with a highly specialized hematology-oncology center that receives referrals from other healthcare facilities in Bavaria. Although platelets are considered valuable resources and are not always readily available in transfusion centers [33], around 90% of our SCT patients received platelet transfusion post-SCT. This finding indicates the need for a more in-depth evaluation of the burden that such an intervention could place on healthcare facilities and patients from a health economic perspective.
Although the hospital administrative claims dataset could be used to identify cases and some health-related events, relying on it exclusively for a research study presents several problems. First, its use was primarily restricted to coded health events and interventions during inpatient stays and, therefore, subject to coding comprehensiveness and accuracy which could not be guaranteed and could bias the results. Second, it did not record important data such as the diagnosis date, clinical details, and laboratory and radiological findings, limiting the appropriate retrospective or prospective evaluation of patients from disease onset and the assessment of correlation between health events and disease onset or treatment. Furthermore, the hospital administrative claims dataset did not permit a comprehensive evaluation of patients' baseline clinical history, disease stage, disease severity, disease progression, and appropriate establishment of the line of therapy due to the unavailability of the aforementioned data. Third, in several instances, ICD-10-GM codes were not precise enough to permit the appropriate evaluation of certain disease conditions or treatment regimens. For example, to identify underweight patients, the ICD-10-GM codes R63.4, abnormal weight loss; R63.6, insufficient intake of food and fluid; and R64, cachexia were used, but none was sufficiently precise to identify underweight cases. The newer version of ICD codes, ICD-11-GM, is more granular and contains a unique code for identifying body mass index-related conditions in adults (ICD-11-GM 5B54, 5B81) [34]. Finally, data required for evaluating health resource utilization, including outpatient and emergency department visits or visits to other departments or even different hospitals, were not recorded. Therefore, the exact disease duration since its first onset could be underestimated, and episodes before the first recorded admission would not be captured. 
Hence, we could not comprehensively evaluate the patients' healthcare resource utilization. Moreover, this single-center analysis is limited to a single healthcare facility, rendering comparisons with other centers' datasets or benchmark evaluations impossible. Despite the limited value of the hospital administrative claims dataset for comprehensively evaluating inpatient management patterns, health resource utilization, and quality-of-care in patients with MM, it provided some insights that require future comprehensive analysis. By contrast, single-payer claims data (e.g., SHI) provide data variables that complement those from hospital administrative claims data (e.g., disease onset, health services provided during outpatient and emergency department visits, and cross-sector information). However, German administrative claims data, in general, lack clinical details on disease stage/severity, response to treatment, and laboratory results, and do not make it possible to distinguish between disease-related and treatment-related complications. Therefore, researchers aiming to address any of these aspects will have to supplement the data with other data sources, such as patient medical records, pharmacy records, laboratory files, and health insurance databases, for intersectoral analyses.
Our observations of the limitations of the hospital administrative claims dataset in terms of adequately evaluating clinical characteristics and disease progression agree with previous reports [5,15,18]. However, the information recorded in our dataset on therapeutic and diagnostic procedures provided to patients during hospitalization allowed us to evaluate the treatment pattern in the SCT group. Similarly, Kreis et al. reported the limitations of German claims data for evaluating treatment patterns and assessing treatment discontinuation/switching due to missing clinical details [14]. Our results are also consistent with previous studies on the limitations of administrative claims data in terms of adequately evaluating the incidence of adverse events [29][30][31]. However, the hospital administrative claims dataset provided some quality indicators, e.g., infections, readmissions, and platelet transfusion rates among SCT patients, signaling possible adverse events that require further evaluation. Such indicators are essential for evaluating the economic impact of the disease. Fonseca et al. reported that multiple admissions and treatment- or disease-related complications affect the financial burden of the disease [2]. These indicators are also crucial for evaluating healthcare management within a healthcare facility over time and for benchmark evaluations, which compare the quality of care among different healthcare facilities in aspects such as guideline adherence and post-treatment complications [35,36]. One of the objectives of the BMBF and MII is to support the use of routine-care data in health research and the exchange of data between different German healthcare institutions [37]. This would allow comprehensive quality-of-care assessments and benchmark evaluations between different healthcare facilities in Germany.
We believe that the dataset can be used for quality indicator and guideline adherence assessment either within the hospital or compared with other tertiary hospitals sharing a similar database infrastructure. Future research should consider linking claims datasets to other secondary data sources to rigorously evaluate disease characteristics, treatment patterns, treatment-related adverse events, and economic burden of MM management from broader perspectives. It must also consider evaluating the quality of care concerning complications and number of admissions after a medical intervention or a novel therapy to enable a more in-depth evaluation of treatment effectiveness.

Conclusions
German hospital administrative claims data are an important information source for identifying cases, medical events, and outcomes of interest based on predefined criteria in rare conditions such as MM in inpatient settings. Patients with MM identified from the dataset had complications such as infections, which indicates the need for more in-depth evaluations for quality-of-care assessment and benchmark evaluation against other healthcare facilities. Furthermore, key elements such as complications, treatment frequency, and readmission rates were available in the dataset, making it a useful secondary data source for health services research studies. However, a comprehensive evaluation across both inpatient and outpatient settings of the clinical characteristics, management patterns, healthcare resource utilization, and quality of care of patients with MM requires linking hospital administrative claims data to other secondary data sources.
Supporting information S1